Bayesian Disclosure Risk Assessment: Predicting Small Frequencies in Contingency Tables

Authors

  • Jonathan J Forster
  • Emily L Webb
Abstract

We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focussed on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty into our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.
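As a rough illustration of what a posterior predictive probability for a small cell can look like, the sketch below uses a deliberately simple baseline, not the authors' model. It assumes a symmetric Dirichlet prior over the K cell probabilities and a simple random sample of n records from a population of N, and it ignores the multivariate structure that the abstract argues should be modelled. The function name, the parameters, and the choice of prior are all assumptions made for illustration. Under these assumptions, the number of unsampled individuals falling in a given cell is Beta-binomial, so the probability that the population count equals the sample count has a closed form.

```python
from math import lgamma, exp


def prob_no_unsampled_in_cell(f_k, n, N, K, alpha=1.0):
    """Posterior predictive P(population count == sample count) for one cell.

    Hypothetical baseline, assuming a symmetric Dirichlet(alpha) prior on
    K cells; this is NOT the model-averaged approach of the paper.

    f_k   : sample frequency of the cell (e.g. 1 for a sample unique)
    n     : sample size
    N     : population size (N >= n)
    K     : number of cells in the table
    alpha : Dirichlet prior weight per cell (assumed, must be > 0)
    """
    a = alpha + f_k            # posterior weight of this cell
    b = K * alpha + n - a      # posterior weight of all other cells
    m = N - n                  # number of unsampled individuals
    # Beta-binomial(m, a, b) mass at zero: B(a, b + m) / B(a, b)
    log_p = lgamma(b + m) - lgamma(b) + lgamma(a + b) - lgamma(a + b + m)
    return exp(log_p)


# A sample-unique cell (f_k = 1): probability that it is also population-unique.
risk = prob_no_unsampled_in_cell(f_k=1, n=1000, N=100000, K=500)
print(risk)
```

Because this baseline treats every cell as exchangeable, it gives the same answer for all cells with the same sample frequency. That is the kind of estimate the abstract describes as less realistic than one that accounts for model uncertainty over the table's structure.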

Similar resources

Bayesian Nonparametric Disclosure Risk Estimation via Mixed Effects Log-linear Models

Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk...

Cell Bounds in Two-Way Contingency Tables Based on Conditional Frequencies

Statistical methods for disclosure limitation (or control) have seen coupling of tools from statistical methodologies and operations research. For the summary and release of data in the form of a contingency table some methods have focused on evaluation of bounds on cell entries in k-way tables given the sets of marginal totals, with less focus on evaluation of disclosure risk given other summa...

Statistical Disclosure Limitation with Released Marginals and Conditionals for Contingency Tables

The goal of statistical disclosure limitation is to develop methods and tools that while preserving confidentiality can provide access to useful statistical data, not just a few numbers. In this paper we consider releases from contingency tables in the form of marginal counts and observed conditional frequencies. We link data utility to log-linear models, and evaluation of disclosure risk to bo...

Assessing the Risk of Disclosure of Confidential Categorical Data

Disclosure limitation involves the application of statistical tools to limit the identification of information on individuals (and enterprises) included as part of statistical data bases such as censuses and sample surveys. We outline the major issues involved in assessing disclosure risk and assuring the protection of confidentiality for data bases, especially those in the form of multi-way co...

Partial Information Releases for Confidential Contingency Table Entries: Present and Future Research Efforts

Tabular data have been a staple product for disseminating information derived from the confidential microdata that fuel social science research and inform policy decisions. This paper outlines recent results on disclosure risk assessment associated with the release of high-dimensional contingency tables, and discusses some related research problems. The main focus is the partial information rel...

Publication date: 2006